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Abstract — Approximating marginals of a graphical model is 
one of the fundamental problems in the theory of networks. In 
a recent paper a method was shown to construct a variational 
free energy such that the linear response estimates, and maximum 
entropy estimates (for beliefs) are in agreement, with implications 
for direct and inverse Ising problems |1|. In this paper we 
demonstrate an extension of that method, incorporating new in- 
formation from the response matrix, and we recover the adaptive- 
TAP equations as the first order approximation |2|. The method 
is flexible with respect to applications of the cluster variational 
method, special cases of this method include Naive Mean Field 
(NMF) and Bethe. We demonstrate that the new framework 
improves estimation of marginals by orders of magnitude over 
standard implementations in the weak coupling limit. Beyond the 
weakly coupled regime we show there is an improvement in a 
model where the NMF and Bethe approximations are known to 
be poor for reasons of frustration and short loops. 

I. Introduction 

Bethe and Naive Mean Field (NMF) are two of the most 
used variational methods, and can be considered special cases 
of the cluster variational method (CVM) J5), 0], lEJ. Fast and 
provably convergent methods are known for the minimization 
of the CVM free energy [6 1, and systematic expansion methods 
about minima have been shown Q, HI. Generalizations and 
convex approximations to CVM have allowed for the devel- 
opment of fast and secure inference methods J5], ifTOl . Linear 
response (LR) is often applied to the variational framework to 
determine accurately pair correlation estimates 0, ifPTl . [12|, 

In studying graphical models one is often interested in the 
estimation of probability distributions over local variables, 
and pair correlations; these can for example form the basis 
for decimation methods or moment matching algorithms [14], 
US], 0. In this paper we develop an extension of the 
standard method for estimating marginal probabilities and 
pair-correlations in CVM, such that the variational parame- 
ters (beliefs) are self-consistent with the LR estimates. Our 
model improves over standard implementations on arbitrary 
graphs if couplings are weak. From the NMF framework 
we recover the adaptive-TAP equations for pairwise spin 
models (2). From the Bethe approximation, we recover the 
Sessak-Monasson expression for correlation estimation from 
a variational framework lfl6l . We apply the method to the 
homogeneous triangular lattice model: Bethe and NMF are 
known to perform poorly on this model due to the presence 



of short loops and frustration. For brevity we will focus 
only on binary variables (spins), pairwise couplings and two 
simple CVM approximations; but the principle is more general 
and might be combined with some of the aforementioned 
expansions and algorithmic methods. 

A. Cluster variational method and linear response 

Consider a model defined over N spin variables {ct; = ±1}, 
interacting through a symmetric coupling matrix J and with 
external fields H. We can write the Hamilonian (Cost func- 
tion); 

s are subsets, and / defines the set of non-zero couplings, for 
the pairwise model / = : Jij ^ 0}. The exact free 

energy (cumulant generating function) of this model is 

F = -logTr[exp(-ft(a))] (2) 

Tr[-] is a sum over the states in the enclosed expression. From 
the free energy one can calculate quantities of interest such as 
the magnetizations and pair connected correlations by linear 
response. Consider the connected correlation over a subset of 
variables s, it would be calculated as 
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Naturally the application of linear response is not restricted to 
the exact free energy, using an approximate free energy one 
obtains pseudo statistics. In either scenario one can construct 
the marginal (pseudo) probabilities, for example 



lLR 



Unfortunately the evaluation of the free energy is computa- 
tionally intractable for many networks of interest, as such 
we cannot construct the marginals (@J by method (0 without 
approximations to the free energy. 

The CVM provides such an approximation, based on a 
weighted sum of local entropy and energy contributions. The 
variational entropy and energy approximations are defined 



S(b) = -^c K Tr[M^)logM^)] 



(5) 
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R are regions (subsets of variables), 6# are beliefs (approxi- 
mate marginal probabilities) over the set of variables <jr (we 
henceforth omit the arguments bji — &r(cr) for brevity), and 
cr are counting numbers. 

The theory of junction trees provides a justification for the 
selection of regions and counting numbers [5], if one selects 
regions according to a junction tree then one can recover the 
exact free energy by a constrained minimization 



Fcvm = min{E(b) - S(b)} , 

b 

subject to < bft < 1 and constraints 



(7) 



= Tr[b R ] , Vi? ; (8) 
= Tr w , [b R ] - b w , Vi?', R:R'cR; 

where Tr\ is a trace over all variables excluding those listed in 
the subscript. The beliefs recovered for a correct region selec- 
tion are exactly the marginal probabilities, and are consistent 
with those determined by linear response b R = bji . 

The regions prescribed by a junction tree have a maximum 
size that depends on the graph width, if this topological 
property is small then CVM can be an efficient way to 
calculate the free energy. If the graph is uncorrelated ( J = 0) 
then the NMF approximation is exact, with cr = 1 for single 
variable regions and cr = for all other regions. For purposes 
of evaluating the energy © we take b s = Yiies^i m NMF. 
If the graph is a tree (or forest) the Bethe approximation is 
exact, with edge regions for every element in i, in addition 
to all vertex regions. In the Bethe case: c s = 1 ,Vs G I; 

d = l- Y,Rci:i£R c R> and C R = otherwise. 

Unfortunately, junction trees are also impractical in general. 
Typically the width of the graph is large, requiring large 
regions for an exact solution, so that the evaluation of the 
entropy (0 is impractical. One is therefore interested in 
approximations; fortunately NMF and Bethe are found to 
be good, or asymptotically exact in many circumstances. 
Given that we resort to these methods in cases where the 
method is not exact, how should one construct the marginal 
probabilities? Two options exist for the regions exploited in 
the approximation: one can use the maximum entropy estimate 
Pr k, bR\ or one can use the linear response estimtate 
Pr ks b R R about the minima of Fcvm- In general these 
estimates differ; one measure of the quality of the variational 
approach is the amount of agreement between these values. 
This paper discusses a modification to Fcvm that allows for 
exact agreement. 

Let us parameterize the beliefs over single variables in a 
manner comparable to (|4), using the set of magnetization (Ci), 
and in the case of Bethe symmetric pair correlation parameters 
iCij = Cji) 



(l + Ci<Tj) Ci.OiGj 

bi = - - ; hj = hbj H 



(9) 



where CV, = for NMF. By this method the constraints (HJ are 
made redundant and we have an unconstrained minimization 
problem, subject to a parameter range < 6r < 1. We exclude 



the possibility of boundary values {0, 1} since to apply linear 
response we will assume a minima in which all parameters 
can fluctuate. 

In this paper we consider the following modification to the 
entropy approximation: we introduce, in the case of NMF the 
constraint that the self-response and magnetization agree as 
per the exact free energy 

Xii = 1 - Cf , Vi . (10) 

We can introduce a Lagrange multiplier in the standard form 
to write the entropy approximation for NMF 



s x = E Tr & lo s^] - E A * ((! - °i) 



Xii) /2 



(11) 



Within the Bethe approximation agreement between bi 



bff requires the additional constraint 
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(12) 



the entropy approximation for Bethe becomes 
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(13) 

The entropies presented are a generalization of those discussed 
in (TJ, where only the off-diagonal constraints were consid- 
ered (Aj = 0, Vi). 

B. Saddle-point equations and reaction terms 

A minima of the free energy requires that the derivatives 
with respect to the variational parameters are zero. The deriva- 
tive with respect to Ci is 

= atanh(C 4 ) - Hi - V] J i: 

j&i) 

where L { = Lf = for NMF, and for Bethe 

'bi. 
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For the Bethe method we must also consider the derivative 
with respect to Cy 
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(16) 



Note that both the entropic term XiCi and the leading 
order of Li, are reaction terms (proportional to Ci); for fully 
connected models these reaction terms are well understood in 
the large system limit |2). When A is taken to be non-zero 
and fixed by linear response, we find that quite generally we 
recover the Onsager reaction term, as later discussed. When 
the constraint ( fTOb is not enforced A^ = 0, in applying the 
Bethe approximation the Onsager reaction term is recovered 
for the case of independent identically distributed couplings. 

When considering variation of the free energy we will treat 
both x an d A as fixed external parameters, the variation is 
restricted to C. The Hessian, with components 

d 2 F CV M(b) 



dC s dCs 



(17) 



is required to be positive definite at the minima. 

Supposing C = C* describes the minimizing arguments for 
{H, J}, in response to a small variation in the fields H + 8H 
the new minima C = C* + SC can be determined from the 
quadratic order expansion of the free energy 

F?(H + 5H,J) = mini F*(H, J) + SCQ x SC/2 



sc I 



(18) 



for NMF (X = N) or Bethe (X = B); this will be the basis 
for constructing the linear response identities. 

1) NMF: In the case of NMF the components of the 
Hessian are 
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The argmin for NMF is 

6Ci = J2[(Q 



N\ 



(19) 



(20) 



Now, the linear response identity in the limit SH — > allows 
us to decompose SC as a sum of perturbations on individual 
fields at leading order 

5C i =Y i XiJH z . (21) 



As such 



Xi,j = [(Q N )~\j = - J)-\ 3 ■ (22) 



We introduce notation $ to denote the entropic (approxi- 
mate) part of Q, separated from the energetic (exact) part. This 
matrix gives us estimates for all pair correlations, as well as 
those within regions. The constraint (TTOb is 
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(23) 



We have a closed set of equations for {C, A} in (TBI and 
Interestingly these equations are exactly those that define 
the adaptive TAP. It was shown by Opper and Winther that 
a solution to the above equations does indeed reproduce the 
Onsager reaction term for correlated and uncorrelated distribu- 
tions of J (and without prior knowledge of the statistics) J2). 

2) Bethe: To describe the Hessian at the Bethe level vari- 
ation of the pair correlation parameters must be considered 
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(24) 



If we break the vector SC into two parts, one describing single 
variable variation SC 1 = {SCi] and one describing pairwise 



variation SC 2 = {SCij} we can determine the minimizing 
arguments of ( fT8l from 

= QW8C 1 + Q {2) 8C 2 ; (25) 
SH = Q^SC 1 + [Q^fsC 2 . (26) 

Eliminating 8C 2 we can proceed as for NMF 

x = [Q {1) [Q (2 ' 1) ] T [Q (2) f 1 Q (2 ' 1) }~ 1 = 1® B Jf 1 ■ (27) 

Calculating the off-diagonal component of $y where 
G /, we find 



Ji 



where the expressions for i ^ j and i = j are 
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(28) 

,(29) 
■(30) 



Applying the constrained method with (fT2l . we find that for 
i ^ j in (l28l we recover the Sessak-Monasson expression for 
calculation of correlations |[T6l . 

II. Implementation 

A. Weak coupling limit 

When couplings (and consequently Cij) are small in 
absolute value the solution to the free energy can be found 
as an expansion about the decoupled case. Using the exact 
entropy terms L* and $* one can determine as expansions in 
{ J} the corresponding solutions C*,x* and A* (A* = in the 
exact case). 

For J = Bethe and NMF are exact. We can make an 
expansion of our expressions assuming C = C* + SC and 
A = A* + SX, with both 5C and 5 A small. We linearize 
our equations, all terms are evaluated for the exact solution 
{C*,A*,x*} in the following expressions. From ([141 



= E Q& 5C > + ( L ? - L ^ - sx * c * 



(31) 



For the Bethe method we require, from ([TBI 

(! : : (If' - / +<5A 



JC S 



(32) 



s=i,j,ij 

which in the constrained case ( fT2l defines SXij. Since SXij 
enters no other equations it does not influence the error on C. 

We can expand <f> about the exact result, distinguishing the 
on and off-diagonal terms 



<f> x . = (® x 5 SX 



^ d$ x 
X ^S( 



The constraints ( TTOb and ( fT2l dictate respectively 

- 2C l sc l = -{ x $ x xh ; s Xij = -[x^ x xh 



(33) 



(34) 



where 8\ij = 5Cij in the latter case, otherwise {A; = 0} or 
{Ay = 0,Sxij 7^ SCij} if the respective constraints are not 
introduced. 

The errors in responses Li, as well as on and off-diagonal 
elements of $ are significant in determining the errors. At 
leading order (=) in the weak coupling limit (J small) 

2J 3 

= 24^, + -^ (35) 

+ 2 ( Jikj + Jjki)^J — ZJfkJjkJjk (36) 

where ti = tanh( Ji), T,; = 1 — tf and we define Jy-fc = 
JikJjkhtjTk; and 

Lf-L* = UDf ■ - $1 = Df . (37) 

For the NMF and Bethe methods we define respectively 

D i = - X! ' D i = 2 X! JjkJi 3 JikTjT k . 

(38) 

In the Bethe method where Cy ^ xy tne error on Cy is 
dominated by 

/•f; // • J ^J,kT k . (39) 

In d35l ) and ( f36l > we demonstrate also the leading order diagram 
relevant for high temperature at 0((3 3 ) and 0((3 5 ) respectively. 

We calculate errors for both the weak coupling (small J) 
and high temperature (J and H are 0(f3)) cases solving the 
linearized equations. We summarise the consequences for the 
error in Ci, Xi^j an d Cij according to constraints introduced 
(left label in list). For NMF errors are 

From (ED and (O 5C % is determined as 0( J 2 , /3 3 ). The 
response error Sxtj is 0(J 2 ,/3 3 ). 
( [Tol l We find = Df, removing the most significant 
source of error in SCi, the error on the magnetization 
improves to 0( J 3 , /3 4 ), the error on Sxij remains limited 
to 0(J 2 , /3 3 ) by the error d35>. 

For Bethe errors are 

8C X and Sxij are 0(J 3 ,^ 4 ). 
( [Tol l We find SCi is improved to 0(J 4 ,/3 5 ), ^Xij remains 

0(J 3 ,/3 4 ). Error sources (|37t are improved, but ( l39l 

remains a significant constraint on accuracy of Sxij- 
( fT2l 5Ci remains 0(J 3 ,/3 4 ) but ^Xij is improved to 

0(J 4 ,/3 4 ). The errors on SC are made independent of 

d39l , but the error sources ((37) are unimproved. 
( 110112b The combined effect is to remove the most significant 

sources of error, both 5Ci and S\ij become 0(J 4 ,/3 5 ). 

The remaining error on Sxij is limited at leading order 

only by (l36l l. 

For Bethe introducing the constraint (ITZt always reduces the 
error on SCij, which is 0(J 2 ,(3 2 ) in the standard method. 




uniform coupling on triangular lattice, J 

Fig. 1. Full correlations estimates on nearest neighbors based on linear 
response b^ R , compared to the exact result (black think curve), and the 
parameters &ij determined for a standadrd implentation of the Bethe approxi- 
mation (thin red). Curves are labeled in the legend according to the constraints 
introduced. For negative J the new methods perform admirably compared to 
standard implementations. All methods perform poorly in the vicinity of the 
phase transition, the paramagnetic solutions of the new methods can be stable 
even beyond the true critical point J > 0.275, though performance is poor. 

B. Iterative scheme 

The non-convex nature of the constraints we are introducing 
makes algorithm development a challenge, but we can solve 
in general these equations for weak-coupling, with a naive 
iterative scheme 

C* +1 = tanh ^H t + Jijmj + X\Cl - L\ j . (40) 

If applying the constraint (fTOl l. we can simultaneously infer 

A* +1 -A*-^((l-(C7f) 2 )- X * 4 )$* l , (41) 

otherwise A,; = 0. Applying constraint ( fT2l . for the Bethe 
method, 

x% = [($* - pjy% ; CI+ 1 = x % • (42) 

with Ay fixed by ( fT6] i; otherwise Ay = and we fix 

b\j = argmin 6y {6y log 6y ./, , IV h,,rr,rr, : C^, C^} . (43) 

To fix bjj at fixed C\ and Cj is equivalent to fixing Cfj . 

The instantaneous mean field is used to update the magne- 
tization in (l40l i. a linear expansion of ( TTOb is used to deter- 
mine ( |4TT >. a naive iteration matching successively the linear 
responses is used in d42l . At large \J\ (I40l-d42l can be unstable 
individually or in combination, damping and annealing can be 
effective strategies to arrive at a solution for strong coupling. 
The procedure (143) is one of convex optimization and doesn't 
contribute to instability. 



C. Strong coupling regime experiments 

We consider a simple model the triangular lattice model 
with uniform couplings Jij = J and zero fields Hi = 
in the large system limit. This model is problematic for 
standard Bethe and NMF for several reasons: it involves short 
loops not accounted for by the region selection; there is a 
continuous symmetry breaking transition at J = 0.275 with 
associated long range correlations IfTTI ; for J < there is 
frustration; for J < there are Kosterlitz-Thouless transitions, 
but no symmetry breaking transitions |fj"8l , |[T9l . For these 
reasons Bethe and NMF estimates for b^ or can be 
poor. The solution can be found for our new methods by 
Fourier analysis. Figure Q] presents a comparison of methods. 
We present only the solution found continuously from J = 
by the iterative method, and we do not present the symmetry 
breaking solutions at J < 0, where they exist. 

For J > the paramagnetic ({Cj = 0}) solutions are, for 
small \ J\, in close agreement with the exact result. Amongst 
new methods all but the doubly constrained approximation 
(Bethe with ( fTOb and ( TT2l ) remain locally stable well beyond 
the true ferromagnetic transition point. The large J para- 
magnetic solutions do have some problems: the correlation 
estimate E[aicrj] becomes poor compared to the standard 
implementations that undergo symmetry breaking transitions; 
negative entropy can be found (when Cjj > 0.652); and finally 
the iterative method struggles to converge without strong 
damping. The doubly contrained method on the other hand 
has a Hessian that becomes singular (for the paramagnetic 
solution) at J = 0.16, in advance of even the standard NMF 
instability J = 1/6; with simple iterative methods we are 
unable to construct a magnetized solution. 

For J < 0, the frustrated regime, performance of the new 
methods is a clear improvement over standard implementa- 
tions. We find a similar pattern of results with respect to 
other components of \ (longer range correlations), and in 
testing lattices of finite size. An instability towards symmetry 
breaking on the tripartite sublattices causes the termination of 
the line Bethe ( fTOb for J < 0; a similar instability affects the 
standard NMF implementation. 

III. Conclusion 

The fundamental objects of the cluster variational methods 
are the beliefs, which are determined by maximum entropy. 
One would expect that these beliefs would be reproducible by 
linear response up to some small error, but for the standard 
method this is not true. The discrepancy provides information 
on global graph structure that we have exploited in the 
new approximation where max entropy and LR are made 
consistent. In this article we have required consistency with the 
quadratic order LR relations, which we expect are of greater 
importance than those of higher order. 

As part of our method we recover the Sessak-Monasson 
equation for pair correlations lTT6l . Furthermore, for the case 
of spins and working from the NMF framework we derive the 
adaptive-TAP equations [2]. The arguments by which these 
well known equations were originally derived is very different 



from our own, offering a point of comparison. In our derivation 
we note that we arrive at these results using simple region 
selection rules in the cluster variational framework, without 
assuming weak correlations; in the case of adaptive TAP and 
Sessak-Monasson we can improve performance by moving to 
higher order region approximations |Q] . 

Our framework is also very flexible with respect to the 
nature of the Hamiltonian; the result is easily generalized to 
other pairwise models, or with small modifications to multi- 
body interactions. We can also hope that some of the powerful 
expansion and algorithmic methods mentioned in the intro- 
duction will be made compatible. Application to the Inverse 
problem are also promising, and relatively simple compared 
to standard implementations of the CVM method (TJ, ll20l . 
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